Abstract

We consider the unsupervised alignment of 
the full text of a book with a human-written 
summary. This presents challenges not seen 
in other text alignment problems, including 
a disparity in length and, consequent to this, 
a violation of the expectation that individual 
words and phrases should align, since large 
passages and chapters can be distilled into 
a single summary phrase. We present two 
new methods, based on hidden Markov mod- 
els, specifically targeted to this problem, and 
demonstrate gains on an extractive book sum- 
marization task. While there is still much 
room for improvement, unsupervised align- 
ment holds intrinsic value in offering insight 
into what features of a book are deemed wor- 
thy of summarization. 

1 Introduction 

The task of extractive summarization is to select 
a subset of sentences from a source document to 
present as a summary. Supervised approaches to 
this problem make use of training data in the form 
of source documents paired with existing summaries
(Marcu, 1999; Osborne, 2002; Jing and McKeown,
1999; Ceylan and Mihalcea, 2009). These methods
learn what features of a source sentence are likely 
to result in that sentence appearing in the summary; 
for news articles, for example, strong predictive fea- 
tures include the position of a sentence in a docu- 
ment (earlier is better), the sentence length (shorter 
is better), and the number of words in a sentence that 
are among the most frequent in the document. 

Supervised discriminative summarization relies
on an alignment between a source document and
its summary. For short texts and training pairs
where a one-to-one alignment between source and
abstract sentences can be expected, standard
techniques from machine translation can be applied,
including word-level alignment (Brown et al., 1990;
Vogel et al., 1996; Och and Ney, 2003) and longer
phrasal alignment (Daumé and Marcu, 2005),
especially as adapted to the monolingual setting
(Quirk et al., 2004). For longer texts where inference
over all possible word alignments becomes intractable,
effective approximations can be made, such as
restricting the space of the available target alignments
to only those that match the identity of the source
word (Jing and McKeown, 1999).

The use of alignment techniques for book summarization,
however, challenges some of these assumptions. The
first is the disparity between the length of the source
document and that of a summary. While the ratio
between abstracts and source documents in the
benchmark Ziff-Davis corpus of newswire
(Marcu, 1999) is approximately 12% (133 words vs.
1,066 words), the length of a full-text book greatly
overshadows the length of a simple summary. Figure 1
illustrates this with a dataset composed of books
from Project Gutenberg paired with plot summaries
extracted from Wikipedia for a set of 439 books
(described more fully in §4.1 below). The average
ratio between a summary and its corresponding book
is 1.2%.

This disparity in size leads to a potential violation 
of a second assumption: that we expect words and 
phrases in the source document to align with words 
and phrases in the target. When the disparity is so 
great, we might rather expect that an entire para- 
graph, page, or even chapter in a book aligns to a 
single summary sentence. 



Figure 1: Size disparity between summaries and full
texts. Summaries average about 1% of the size of the
corresponding book. The mean ratio is 0.012, with a
[5, 95] quantile range of [0.002, 0.032].

To help adapt existing methods of supervised
document summarization to books, we present two
alignment techniques that are specifically adapted to
the problem of book alignment: one that aligns
passages of varying size in the source document to
sentences in the summary, guided by the unigram
language model probability of the sentence under that
passage; and one that generalizes the HMM alignment
model of Och and Ney (2003) to the case of long but
sparsely aligned documents.

2 Related Work

This work builds on a long history of unsupervised
word and phrase alignment originating in the machine
translation literature, both for the task of learning
alignments across parallel text (Brown et al., 1990;
Vogel et al., 1996; Och and Ney, 2003; DeNero et al.,
2008) and between monolingual (Quirk et al., 2004)
and comparable corpora (Barzilay and Elhadad, 2003).
For the related task of document/abstract alignment,
we draw on work in document summarization (Marcu,
1999; Osborne, 2002; Daumé and Marcu, 2005). Past
approaches to fictional summarization, including both
short stories (Kazantseva and Szpakowicz, 2010) and
books (Mihalcea and Ceylan, 2007), have tended toward
non-discriminative methods; one notable exception is
Ceylan (2011), which applies the Viterbi alignment
method of Jing and McKeown (1999) to a set of 31
literary novels.



3 Methods 

We present two methods, both of which involve es- 
timating the parameters of a hidden Markov model 
(HMM). The HMMs differ in their definitions of 
states, observations, and parameterizations of the 
emission distributions. We present a generic HMM 
first, then instantiate it with each of our two models, 
discussing their respective inference and learning al- 
gorithms in turn. 

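Let $\mathcal{S}$ be the set of hidden states and $K = |\mathcal{S}|$. An
observation sequence $\mathbf{t} = \langle t_1, \ldots, t_n \rangle$, each
$t_\ell \in \mathcal{V}$, is assigned probability:

$$p(\mathbf{t}) = \sum_{\mathbf{z} \in \mathcal{S}^n} \pi_{z_1} \, \eta_{z_1, t_1} \prod_{\ell=2}^{n} \gamma_{z_{\ell-1}, z_\ell} \, \eta_{z_\ell, t_\ell} \tag{1}$$

where $\mathbf{z}$ is the sequence of hidden states, $\pi \in \Delta^K$
is the distribution over start states, and for all $s \in \mathcal{S}$,
$\eta_s \in \Delta^{|\mathcal{V}|}$ and $\gamma_s \in \Delta^K$ are $s$'s
emission and transition distributions, respectively. Note that we
avoid stopping probabilities by always conditioning on the sequence
length.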
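The sum in Equation 1 ranges over all $K^n$ state sequences, but it can be computed in $O(K^2 n)$ time with the standard forward algorithm. The Python sketch below is illustrative only, not the authors' implementation: it assumes small dense parameter tables (`pi`, `gamma`, `eta`, names chosen here) and checks the forward recursion against a literal brute-force sum.

```python
from itertools import product


def forward_likelihood(pi, gamma, eta, obs):
    """p(t) from Equation 1, computed with the forward algorithm.

    pi[s]:        start probability of state s
    gamma[s][s2]: transition probability from s to s2
    eta[s][v]:    emission probability of observation v from state s
    obs:          the observation sequence t_1, ..., t_n (n >= 1)
    """
    K = len(pi)
    # alpha[s] = p(t_1..t_l, z_l = s); each step costs O(K^2)
    alpha = [pi[s] * eta[s][obs[0]] for s in range(K)]
    for t in obs[1:]:
        alpha = [sum(alpha[r] * gamma[r][s] for r in range(K)) * eta[s][t]
                 for s in range(K)]
    return sum(alpha)


def brute_force_likelihood(pi, gamma, eta, obs):
    """The same quantity, summed literally over all K^n state sequences z."""
    K = len(pi)
    total = 0.0
    for z in product(range(K), repeat=len(obs)):
        p = pi[z[0]] * eta[z[0]][obs[0]]
        for l in range(1, len(obs)):
            p *= gamma[z[l - 1]][z[l]] * eta[z[l]][obs[l]]
        total += p
    return total
```

For a two-state model and a short sequence both functions agree; only the forward version remains feasible as $n$ grows.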

3.1 Passage Model 

In the passage model, each HMM state corresponds 
to a contiguous passage in the source document. 
The intuition behind this approach is the following: 
while word and phrasal alignment attempts to cap- 
ture fine-grained correspondences between a source 
and target document, longer documents that are dis- 
tilled into comparatively short summaries may in- 
stead have long, topically coherent passages that are 
summarized into a single sentence. For example, 
the following summary sentence in a Wikipedia plot 
synopsis summarizes several long episodic passages 
in The Adventures of Tom Sawyer. 

After playing hooky from school on 
Friday and dirtying his clothes in a fight, 
Tom is made to whitewash the fence as 
punishment all of the next day. 

               Passage model                 Token model
states S       source document passages      source document tokens
observations   summary sentences             summary tokens
transitions    by passage order difference   by distance bin
emissions      unigram distribution          lexical identity, synonyms

Table 1: Summary of the passage model (§3.1) and the token model (§3.2).

Our aim is to find the sequence of passages in the
source document that aligns to the sequence of summary
sentences. Therefore, we identify each HMM state
$s \in \mathcal{S}$ with source document positions $i_s$
and $j_s$. When a summary sentence
$t_\ell = \langle t_{\ell,1}, \ldots, t_{\ell,|t_\ell|} \rangle$
is sampled from state $s$, its emission probability is
defined as follows:

$$\eta_{s, t_\ell} = \prod_{k=1}^{|t_\ell|} p_{\text{unigram}}(t_{\ell,k} \mid \mathbf{b}_{i_s:j_s}) \tag{2}$$

where $\mathbf{b}_{i_s:j_s}$ is the passage in the source
document from position $i_s$ to position $j_s$; again, we
avoid a stop symbol by implicitly assuming lengths are
fixed exogenously. The unigram distribution
$p_{\text{unigram}}(\cdot \mid \mathbf{b}_{i_s:j_s})$ is estimated
directly from the source document passage $\mathbf{b}_{i_s:j_s}$.
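A minimal sketch of the passage emission in Equation 2, assuming tokenized input, 0-based inclusive positions, and the relative-frequency unigram estimate that the boundary-sampling section uses. The function name `log_emission` is hypothetical. Working in log space avoids underflow for long sentences; note that under a pure relative-frequency estimate any summary word absent from the passage makes the whole sentence impossible.

```python
import math
from collections import Counter


def log_emission(sentence, doc, i, j):
    """log eta_{s, t_l} (Equation 2) for a passage spanning doc[i..j].

    Positions are 0-based and inclusive (the text counts from 1). The unigram
    model is the relative frequency of each word inside the passage itself.
    """
    passage = doc[i:j + 1]
    freq = Counter(passage)
    total = 0.0
    for w in sentence:
        if freq[w] == 0:
            return float("-inf")  # unseen word: relative frequency gives zero mass
        total += math.log(freq[w] / len(passage))
    return total
```

For example, with `doc = "tom must whitewash the fence while ben eats the apple".split()`, the sentence `["tom", "whitewash", "fence"]` scores $3 \log(1/5)$ under the span 0 to 4 and $-\infty$ under the span 5 to 9.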

The transition distribution from state $s \in \mathcal{S}$,
$\gamma_s$, is operationalized following the HMM word
alignment formulation of Vogel et al. (1996). The
transition events between ordered pairs of states are
binned by the difference in the two passages' ranks
within the source document.¹ We give the formula for
relative frequency estimation of the transition
distributions:

$$\gamma_{s,s'} = \frac{c(s' - s)}{\sum_{s'' \in \mathcal{S}} c(s'' - s)} \tag{3}$$

where $c(\cdot)$ denotes the count of jumps of a particular
length, measured as the distance between the rank
order of two passages within a document; the count
of a jump between passage 10 and passage 13 is the
same as that between passage 21 and 24, namely
$c(3)$. Note that this distance is signed, so that the
distance of a backwards jump from passage 13 to
passage 10 (−3) is not the same as that of a jump from
10 to 13 (3).
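The relative-frequency estimate in Equation 3 shares one count per signed jump length across all states. Below is a minimal sketch, assuming passages are indexed by rank `0..K-1` and that jump counts come from a single alignment path (in EM they would be expected counts from the E-step). The uniform fallback for a state with no outgoing mass is our assumption, not something the text specifies.

```python
from collections import Counter


def count_jumps(path):
    """Signed rank differences between consecutive passages in an alignment path."""
    return Counter(b - a for a, b in zip(path, path[1:]))


def transition_matrix(jump_counts, K):
    """gamma[s][s2] = c(s2 - s) / sum_{s''} c(s'' - s), as in Equation 3.

    jump_counts maps a signed jump d to its count c(d); every state pair with
    the same signed distance reads the same count.
    """
    gamma = []
    for s in range(K):
        row = [jump_counts.get(s2 - s, 0.0) for s2 in range(K)]
        z = sum(row)
        # Assumption: a state with no reachable observed jumps falls back to uniform.
        gamma.append([c / z for c in row] if z > 0 else [1.0 / K] * K)
    return gamma
```

Because only the signed distance matters, `count_jumps([10, 13])` and `count_jumps([21, 24])` both yield a single count for a jump of 3. Each row is normalized over the jumps that are actually reachable from that state.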

The HMM states' spans are constrained not to
overlap with each other, and they need not cover
the source document. Because we do not know the
boundary positions for states in advance, we must
estimate them alongside the traditional HMM
parameters. Figure 2 illustrates this scenario with
a sequence of 17 words in the source document
([1 ... 17]) and 4 sentences in the target summary
({a, b, c, d}). In this case, the states correspond to
[1 ... 4], [9 ... 13], and [15 ... 17].

¹ These ranks are fixed; our inference procedure does not
allow passages to overlap or to "leapfrog" over each other
across iterations.

3.1.1 Inference 

Given a source document $\mathbf{b}$ and a target summary
$\mathbf{t}$, our aim is to infer the most likely passage $z_\ell$
for each sentence $t_\ell$. This depends on the parameters
($\pi$, $\eta$, and $\gamma$) and the passages associated with each
state, so we estimate those as well, seeking to maximize
likelihood. Our approach is an EM-like algorithm
(Dempster et al., 1977); after initialization, it
iterates among three steps:

• E-step. Calculate $p(\mathbf{t})$ and the posterior
distributions $q(z_\ell \mid \mathbf{t})$ for each sentence $t_\ell$.
This is done using the forward-backward algorithm.

• M-step. Estimate $\pi$ and $\gamma$ from the posteriors,
using the usual HMM M-step.

• S-step. Sample new passages for each state.
The sampling distribution considers, for each
state $s$, moving $i_s$ subject to the no-overlapping
constraint and $j_s$, and then moving $j_s$ subject to
the no-overlapping constraint and $i_s$ (DeNero
et al., 2008). (See §3.1.2 for more details.) The
emission distribution $\eta_s$ is updated whenever
$i_s$ and $j_s$ change, through Equation 2.

For the experiments described in section 4, each
source document is initially divided into K
equal-length passages (K = 100), from which initial
emission probabilities are defined; $\pi$ and $\gamma$ are both
initialized to uniform distributions. Boundary samples
are collected once per iteration, after one E-step
and one M-step, for a total of 500 iterations.
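The E-step relies on forward-backward to obtain the posteriors $q(z_\ell \mid \mathbf{t})$ that both the M-step and the S-step consume. The sketch below is a generic scaled forward-backward in pure Python, not the authors' code. It assumes the emission probabilities have already been evaluated into a table `emit[l][s]` (for the passage model, the exponentiated Equation 2 scores) and that $p(\mathbf{t}) > 0$. Rescaling at each position keeps long documents from underflowing, and the scale factors also give $\log p(\mathbf{t})$.

```python
import math


def posteriors(pi, gamma, emit):
    """E-step: q(z_l = s | t) for every position l, plus log p(t).

    emit[l][s] is the emission probability of observation l under state s.
    Alpha and beta are rescaled at each position to avoid underflow.
    """
    n, K = len(emit), len(pi)
    alpha, scale = [], []
    for l in range(n):
        if l == 0:
            a = [pi[s] * emit[0][s] for s in range(K)]
        else:
            prev = alpha[-1]
            a = [sum(prev[r] * gamma[r][s] for r in range(K)) * emit[l][s]
                 for s in range(K)]
        c = sum(a)              # c_l = p(t_l | t_1..t_{l-1})
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(n)]
    for l in range(n - 2, -1, -1):
        beta[l] = [sum(gamma[s][r] * emit[l + 1][r] * beta[l + 1][r]
                       for r in range(K)) / scale[l + 1]
                   for s in range(K)]
    q = [[alpha[l][s] * beta[l][s] for s in range(K)] for l in range(n)]
    return q, sum(math.log(c) for c in scale)
```

Each row of `q` sums to one. The fractional assignments `q[l][s]` are exactly the exponents that appear in the boundary likelihood of the S-step.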
Figure 2: Illustration of the passage HMM. HMM states correspond to passages in the source document (top); each
emission is a summary sentence (bottom).



3.1.2 Sampling chunk boundaries 



During the S-step, we sample the boundaries of
each HMM state's passage, favoring (stochastically)
those boundaries that make the observations more
likely. We expect that, early on, most chunks will be
radically reduced to smaller spans that closely match
the target sentences aligned to them with high
probability. Over subsequent iterations, longer spans
should be favored when adding words at a boundary
offsets the cost of adding the non-essential words
between the old and new boundary.

A greedy step, analogous to the M-step used to
estimate parameters, is one way to do this: we
could, on each S-step, move each span's boundaries
to the positions that maximize likelihood under the
revised language model. Good local choices, however,
may lead to suboptimal global results, so we
turn instead to sampling. Note that, if our model
defined a marginal distribution over passage boundary
positions in the source document, this sampling
step could be interpreted as part of a Markov Chain
Monte Carlo EM algorithm (Wei and Tanner, 1990).
As it is, we define no such distribution, which is
equivalent to a fixed uniform distribution over all
valid (non-overlapping) passage boundaries.

The implication is that the probability of a particular
start or end position for state $s$'s passage is
proportional to the probability of the observations
generated given that span. Following any E-step, the
assignment of observations to $s$ will be fractional. This
means that the likelihood, as a function of particular
values of $i_s$ and $j_s$, depends on all of the sentences
in the summary:

$$L(i_s, j_s) = \prod_{\ell=1}^{n} \left( \prod_{k=1}^{|t_\ell|} p_{\text{unigram}}(t_{\ell,k} \mid \mathbf{b}_{i_s:j_s}) \right)^{q(z_\ell = s \mid \mathbf{t})} \tag{4}$$

For example, in Figure 2 the start position of
the second span (word 9) might move anywhere
from word 5 (just past the end of the previous span)
to word 12 (the end of its own span, $j_s = 12$). Each
of these values should be sampled with probability
proportional to Equation 4, so that the sampling
distribution is:

$$\frac{1}{\sum_{i'=5}^{12} L(i', 12)} \big( L(5, 12), L(6, 12), \ldots, L(12, 12) \big)$$
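One left-boundary move of the S-step can be sketched as follows, assuming 0-based inclusive positions, tokenized summary sentences, and posteriors `q_s[l]` $= q(z_\ell = s \mid \mathbf{t})$ from the E-step. The parameter `lo` (first position after the previous state's passage) is our way of encoding the no-overlap constraint. This version recomputes word counts from scratch for each candidate, which is quadratic in the span length. The linear-time decomposition described next avoids that.

```python
import math
import random
from collections import Counter


def log_L(doc, i, j, sentences, q_s):
    """log L(i, j) from Equation 4 for one state whose passage is doc[i..j]."""
    freq = Counter(doc[i:j + 1])
    size = j - i + 1
    total = 0.0
    for sent, q in zip(sentences, q_s):
        if q == 0.0:
            continue  # sentences with no posterior mass on s do not constrain it
        for w in sent:
            if freq[w] == 0:
                return float("-inf")
            total += q * math.log(freq[w] / size)
    return total


def sample_left_boundary(doc, lo, j, sentences, q_s, rng):
    """Draw a new i in [lo, j] with probability proportional to L(i, j).

    Returns (sampled i, normalized probabilities over the candidates lo..j).
    """
    cands = list(range(lo, j + 1))
    logs = [log_L(doc, i, j, sentences, q_s) for i in cands]
    m = max(logs)
    if m == float("-inf"):
        raise ValueError("no left boundary gives the assigned sentences nonzero probability")
    weights = [math.exp(x - m) for x in logs]  # subtract the max for numerical stability
    z = sum(weights)
    probs = [w / z for w in weights]
    return rng.choices(cands, weights=weights)[0], probs
```

With `doc = "x y a b a".split()` and a single sentence `["a", "b"]` fully assigned to the state, the candidate starting at position 3 (the span `b a`) is the mode. The candidate at position 4 gets zero probability because it drops the only `b`.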



Calculating $L$ for different boundaries requires
recalculating the emission probabilities $\eta_{s,t_\ell}$ as the
language model changes. We can do this efficiently
(in linear time) by decomposing the language model
probability. Here we represent a state $s$ by its
boundary positions in the source document, $i:j$, and we
use the relative frequency estimate for $p_{\text{unigram}}$:

$$\log \eta_{i:j, t_\ell} = \sum_{k=1}^{|t_\ell|} \log \frac{\mathit{freq}(t_{\ell,k}; \mathbf{b}_{i:j})}{j - i + 1} \tag{5}$$

$$\log \eta_{i:j, t_\ell} = -|t_\ell| \log(j - i + 1) + \sum_{k=1}^{|t_\ell|} \log \mathit{freq}(t_{\ell,k}; \mathbf{b}_{i:j}) \tag{6}$$

Now consider the change if we remove the first word
from $s$'s passage, so that its boundaries are $[i + 1, j]$.
Let $b_i$ denote the source document's word at position
$i$, and let $\mathit{freq}(b_i; t_\ell)$ be the number of tokens in
$t_\ell$ equal to $b_i$. Then

$$\log \eta_{i+1:j, t_\ell} = -|t_\ell| \log(j - i) + \sum_{k=1}^{|t_\ell|} \log \mathit{freq}(t_{\ell,k}; \mathbf{b}_{i+1:j})$$

$$\log \eta_{i+1:j, t_\ell} = \log \eta_{i:j, t_\ell} + \mathit{freq}(b_i; t_\ell) \log \frac{\mathit{freq}(b_i; \mathbf{b}_{i:j}) - 1}{\mathit{freq}(b_i; \mathbf{b}_{i:j})} + |t_\ell| \log \frac{j - i + 1}{j - i} \tag{7}$$

This recurrence is easy to solve for all possible left
boundaries (respecting the no-overlap constraints)
if we keep track of the word frequencies in each
span of the source document, something we must
do anyway to calculate $p_{\text{unigram}}$. A similar
recurrence holds for the right boundary of a passage.
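The recurrence in Equation 7 can be checked directly. The sketch below is our own illustration (0-based inclusive positions, names hypothetical). It sweeps the left boundary from `lo` to `j` and applies Equation 7 one word at a time, which costs one initial count plus constant work per step, and it compares every value against Equation 6 recomputed from scratch. Once the last copy of a summary word leaves the passage, the score becomes $-\infty$ and stays there, since shrinking the span further can never bring the word back.

```python
import math
from collections import Counter

NEG_INF = float("-inf")


def log_eta_direct(sent, doc, i, j):
    """Equation 6 computed from scratch for the span doc[i..j]."""
    freq = Counter(doc[i:j + 1])
    if any(freq[w] == 0 for w in sent):
        return NEG_INF
    return -len(sent) * math.log(j - i + 1) + sum(math.log(freq[w]) for w in sent)


def log_eta_all_left(sent, doc, lo, j):
    """log eta_{i:j, t} for every left boundary i in [lo, j] via Equation 7."""
    freq = Counter(doc[lo:j + 1])   # freq(. ; b_{i:j}), updated as i moves right
    in_sent = Counter(sent)         # freq(b_i ; t_l)
    n_tok = len(sent)               # |t_l|
    cur = log_eta_direct(sent, doc, lo, j)
    out = {lo: cur}
    for i in range(lo, j):
        b = doc[i]                  # the word dropped from the front of the passage
        if in_sent[b] > 0:
            f = freq[b]
            # freq(b_i; t_l) * log((f - 1) / f); removing the last copy gives -inf
            cur += (in_sent[b] * math.log((f - 1) / f)) if f > 1 else NEG_INF
        cur += n_tok * math.log((j - i + 1) / (j - i))  # length-normalizer change
        freq[b] -= 1
        out[i + 1] = cur
    return out
```

Running both functions over `doc = "a b a c a b".split()` with the sentence `["a", "b"]` gives identical scores for every left boundary. The single-word span `b` correctly scores $-\infty$.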

Figure 3 illustrates the result of this sampling
procedure on the start and end positions for a single
source passage in Heart of Darkness. After 500
iterations, the samples can be seen to fluctuate over
a span of approximately 600 words; however, the
modes are relatively peaked, with the most likely
start position at 1613 and the most likely end
position at 1660 (yielding a span of 47 words).




[Figure 3 plot; x-axis: position in source document]

Figure 3: Density plot of accumulated samples for one 
passage HMM state, in Heart of Darkness. The left 
boundary is shown in black and solid, the right bound- 
ary in red and dashed. 

3.2 Token Model 

Jing and McKeown (1999) introduced an HMM
whose states correspond to tokens in the source
document. The observation is the sequence of target
summary tokens (restricted to those types found
in the source document). The emission probabilities
are fixed to be one if the source and target
words match and zero if they do not. Hence each
instance of $v \in \mathcal{V}$ in the target summary is assumed
to be aligned to an instance of $v$ in the source. The
transition parameters were fixed manually to simulate
a ranked set of transition types (e.g., transitions
within the same sentence are more likely than
transitions between sentences). No parameter estimation
is used; the Viterbi algorithm is used to find the most
probable alignment. The allowable transition space
is bounded by $F^2$, where $F$ is the frequency of the
most common token in the source document. The
resulting model is scalable to large source documents
(Ceylan and Mihalcea, 2009; Ceylan, 2011).
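Because emissions are 0/1 under lexical identity, the only candidate states for a summary token are the source positions holding the same word. This is why each Viterbi step touches at most $F \times F$ transitions. The sketch below is illustrative only. The transition score `log_trans(p, q)` is a hypothetical stand-in: we do not reproduce Jing and McKeown's hand-ranked transition types, and the example simply prefers jumping to the next word.

```python
def viterbi_lexical(doc, summary, log_trans):
    """Most probable source position for each summary token (lexical identity).

    Candidate states for a summary token are the source positions holding that
    word. Summary tokens whose type never occurs in doc are dropped.
    log_trans(p, q) scores a jump from source position p to q (hypothetical).
    """
    positions = {}
    for p, w in enumerate(doc):
        positions.setdefault(w, []).append(p)
    tokens = [w for w in summary if w in positions]
    if not tokens:
        return []
    score = {p: 0.0 for p in positions[tokens[0]]}  # uniform start
    back = []
    for w in tokens[1:]:
        new_score, ptr = {}, {}
        for q in positions[w]:
            # at most F predecessors for each of at most F candidates: O(F^2) per token
            best = max(score, key=lambda p: score[p] + log_trans(p, q))
            new_score[q] = score[best] + log_trans(best, q)
            ptr[q] = best
        back.append(ptr)
        score = new_score
    path = [max(score, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With `doc = "the cat sat on the mat".split()`, the summary `the cat on the mat`, and `log_trans = lambda p, q: -abs(q - p - 1)`, the alignment is `[0, 1, 3, 4, 5]`. The second "the" goes to position 4 rather than back to position 0.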

One potential issue with this model is that it
lacks the concept of a null source, which was not
articulated in the original HMM alignment model of
Vogel et al. (1996) but was added by Och and Ney (2003).
Without such a null source, every word in the summary
must be generated by some word in the source
document. The consequence of this decision is that
a Viterbi alignment over the summary must pick a
perhaps distant, low-probability word in the source
document if no closer word is available. Additionally,
while the choice to enforce lexical identity
constrains the state space, it also limits the range of
lexical variation captured.

Our second model extends Jing and McKeown's
approach in three ways.

First, we introduce parameter inference to learn
the values of start probabilities and transitions that
maximize the likelihood of the data, using the EM
algorithm. We operationalize the transition
probabilities again following Vogel et al. (1996), but
constrain the state space by only measuring transitions
between a fixed set of distance bins, rather than between
the absolute positions of each source word. The
relative frequency estimator for transitions is:

$$\gamma_{s,s'} = \frac{c(b(s' - s))}{\sum_{s'' \in \mathcal{S}} c(b(s'' - s))} \tag{8}$$

As above, $c(\cdot)$ denotes the count of an event, and
here $b(\cdot)$ is a function that transforms the difference
between two token positions into a coarser set of
bins (for example, $b$ may transform a distance of
[…] into its own bin, a distance of +1 into a different
bin, a distance in the range [+2, +10] into a third
bin, a difference in the range [−10, −2] into a fourth, etc.).
Future work may include dynamically learning
optimal bin sizes, much as boundaries are learned in
the passage HMM.

Second, we introduce the concept of a null source
that can generate words in the target sentence. In the
sentence-to-sentence translation setting, for a source
sentence that is $m$ words long, Och and Ney (2003)
add $m$ corresponding NULL tokens, one for each
source word position, to be able to adequately model
transitions to, from, and between NULL tokens in an
alignment. For a source document that is ca. 100,000
words long, this is clearly infeasible (since the
complexity of even a single round of forward-backward
inference is $O(m^2 n)$, where $n$ is the number of
words in the target summary $\mathbf{t}$). However, we can
solve this problem by noting that the transition
probability as defined above is not measured between
individual words, but rather between the positions of
coarser-grained chunks that contain each word; by
coarsening the transitions to model the jump between
a fixed set of $B$ bins (where $B \ll m$), we effectively
only need to add $B$ null tokens, making inference
tractable. As a final restriction, we disallow
transitions between source state positions $i$ and $j$
where $|i - j| > T$. In the experiments described in
section 4, $T = 1000$.
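The binned estimator of Equation 8, together with the cutoff $T$ on jump distance, can be sketched as below. The +1, [+2, +10], and [−10, −2] bins follow the examples in the text. The bins for 0, −1, and jumps beyond ±10 are our own hypothetical choices, as are the names `bin_of`, `token_transitions`, and `max_jump` (standing in for $T$). Null tokens are not modeled here. The function applies Equation 8 literally, so every target position inside a wide bin receives that bin's full count before normalization.

```python
def bin_of(d):
    """Map a signed jump d between token positions to a coarse bin label.

    The +1, [+2, +10] and [-10, -2] bins follow the examples in the text;
    the remaining bins are hypothetical choices for this sketch.
    """
    if d == 1:
        return "+1"
    if 2 <= d <= 10:
        return "+2..+10"
    if -10 <= d <= -2:
        return "-10..-2"
    if d > 10:
        return ">+10"
    if d < -10:
        return "<-10"
    return str(d)  # 0 and -1 each get a bin of their own


def token_transitions(bin_counts, s, m, max_jump):
    """gamma_{s, s'} from Equation 8 for a source document of m tokens.

    bin_counts maps a bin label to its (expected) count; targets with
    |s - s'| > max_jump (the threshold T) are disallowed entirely.
    """
    targets = range(max(0, s - max_jump), min(m, s + max_jump + 1))
    weights = {s2: bin_counts.get(bin_of(s2 - s), 0.0) for s2 in targets}
    z = sum(weights.values())
    return {s2: w / z for s2, w in weights.items()} if z > 0 else {}
```

With the bin counts `{"+1": 4.0, "+2..+10": 1.0, "0": 1.0}`, state 0 in a five-token document with `max_jump=3` moves to position 1 with probability 4/7. Position 4 lies beyond the cutoff and is excluded entirely.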

Third, we expand the emission probabilities to
allow the translation of a source word into a fixed
set of synonyms (e.g., as derived from Roget's
Thesaurus²). This expands the coverage of important
lexical variants while still constraining the allowable
emission space to a reasonable size. All synonyms
of a word are available as potential "translations";
the exact translation probability (e.g., $\eta_{\text{purchase},\text{buy}}$) is
learned during inference.
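One way to realize a synonym-restricted emission table is sketched below: a uniform start over each source word's allowed outputs (itself plus its synonyms), then an M-step that renormalizes expected counts while keeping the support fixed. The `synonyms` dictionary is a hypothetical stand-in for a thesaurus lookup. Keeping a row's previous values when it receives no expected counts is our assumption.

```python
def init_emissions(source_vocab, synonyms):
    """Uniform initial eta[v][w] over w in {v} plus the listed synonyms of v."""
    eta = {}
    for v in source_vocab:
        allowed = {v} | set(synonyms.get(v, ()))
        eta[v] = {w: 1.0 / len(allowed) for w in allowed}
    return eta


def m_step_emissions(expected_counts, eta):
    """Renormalize expected (source word, summary word) counts within each row.

    The support of each row stays fixed, so a source word can only be emitted
    as itself or as one of its listed synonyms.
    """
    new_eta = {}
    for v, row in eta.items():
        counts = {w: expected_counts.get((v, w), 0.0) for w in row}
        z = sum(counts.values())
        # Assumption: rows with no expected counts keep their previous values.
        new_eta[v] = {w: c / z for w, c in counts.items()} if z > 0 else dict(row)
    return new_eta
```

If "purchase" is listed with the synonyms "buy" and "acquire", each of its three outputs starts at 1/3. After observing expected counts of 3 for "buy" and 1 for "purchase", the row becomes 0.75, 0.25, and 0.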

4 Experiments 

To evaluate these two alignment methods and com- 
pare with past work, we evaluate on the downstream 
task of extractive book summarization. 



4.1 Data 

The available data includes 14,120 book plot summaries
extracted from the November 2, 2012 dump of
English-language Wikipedia³ and 31,393 English-language
books from Project Gutenberg.⁴ We restrict the
book/summary pairs to only those where the full text
of the book contains at least 10,000 words and the
paired abstract contains at least 100 words (stopwords
and punctuation excluded). This results in a dataset
of 439 book/summary pairs, where the average book
length is 43,223 words and the average summary length
is 369 words (again, not counting stopwords and
punctuation).
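The length filter above can be sketched in a few lines. The stopword list and the whitespace-plus-punctuation-stripping tokenizer are hypothetical simplifications; the text does not specify either.

```python
import string

# Hypothetical stopword list; the paper does not say which list it used.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "he", "she", "it", "was"}


def content_length(text):
    """Number of tokens left after dropping punctuation and stopwords."""
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    return sum(1 for tok in tokens if tok and tok not in STOPWORDS)


def keep_pair(book_text, summary_text, min_book=10_000, min_summary=100):
    """Keep a book/summary pair only if both sides meet the length thresholds."""
    return (content_length(book_text) >= min_book
            and content_length(summary_text) >= min_summary)
```

For instance, `content_length("The cat, and the hat!")` counts only "cat" and "hat".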

The ratio between summaries and full books in
this dataset is approximately 1.2%, much smaller
than in previous work in any domain, even past
work involving literary novels: Ceylan (2009) makes
use of a collection of 31 books paired with relatively
long summaries from SparkNotes, CliffsNotes and
GradeSaver, where the average summary length is
6,800 words. We focus instead on the more concise
case, targeting summaries that distill an entire book
into approximately 500 words.

4.2 Discriminative summarization 

We follow a standard approach to discriminative 
summarization. All experiments described below 
use 10-fold cross validation, in which we partition 
the data into ten disjoint sets, train on nine of them 
and then test on the remaining held-out partition. 
Ten evaluations are conducted in total, with the
reported score being the average across all ten sets.
First, all source books and paired summaries in the 
training set are aligned using one of the three unsu- 
pervised methods described above (Passage HMM, 
Token HMM, Jing 1999). 

² http://www.gutenberg.org/ebooks/10581
³ http://dumps.wikimedia.org/enwiki/
⁴ http://www.gutenberg.org

Next, all of the sentences in the source side of
the book/summary pairs are featurized; all sentences
that have been aligned to a sentence in the summary
are assigned a label of 1 (appearing in summary) and
all others a label of 0 (not appearing in summary).
Using this featurized representation, we then train a
binary logistic regression classifier with ℓ2
regularization on the training data to learn which
features are the most indicative of a source sentence
appearing in a summary. Following previous work, we
devise sentence-level features that can be readily
computed in comparison both with the document in
which the sentence is found and with the collection
of documents as a whole (Yeh et al., 2005; Shen
et al., 2007). All feature values are binary:



• Sentence position within document, discretized
into membership in each of ten deciles. (10
features.)

• Sentence contains a salient name. We
operationalize "salient name" as the 100 capitalized
words in a document with the highest TF-IDF
score in comparison with the rest of the data;
only non-sentence-initial tokens are used to
calculate counts. (100 features.)

• Contains lexical item x (x ∈ the 10,000 most
frequent words). This captures the tendency for
some actions, such as kills and dies, to be more
likely to appear in a summary. (10,000 features.)

• Contains the first mention of lexical item x
(x ∈ the 10,000 most frequent words). (10,000
features.)

• Contains a word that is among the top [1, 10],
[1, 100], or [1, 1000] words having the highest
TF-IDF scores for that book. (3 features.)
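A few of the binary features listed above can be computed as follows. This sketch covers the decile, salient-name, lexical-item, and first-mention features and omits the three TF-IDF rank features. It assumes the frequent-word vocabulary and the salient-name set are computed beforehand; the feature-name strings mimic the style of Table 3 but are otherwise our own.

```python
def sentence_features(sentences, vocab, salient_names):
    """Binary feature sets for each tokenized sentence of one book.

    vocab: the most frequent words (10,000 in the paper; any set works here).
    salient_names: the book's top TF-IDF capitalized words (100 in the paper).
    """
    n = len(sentences)
    seen = set()        # words already mentioned earlier in the book
    feats = []
    for idx, toks in enumerate(sentences):
        f = {f"DECILE_{10 * idx // n}"}
        if any(t in salient_names for t in toks):
            f.add("IS_NAME")
        for t in set(toks) & vocab:
            f.add(f"WORD={t}")
            if t not in seen:
                f.add(f"FIRST={t}")
        seen.update(toks)
        feats.append(f)
    return feats
```

For a three-sentence toy book, the third sentence lands in decile 6 (`10 * 2 // 3`). A frequent word repeated from an earlier sentence yields `WORD=` but no `FIRST=` feature.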

With a trained model and learned weights for all 
features, we next featurize each sentence in a test 
book according to the same set of features described 
above and predict whether or not it will appear in 
the summary. Sentences are then ranked by this
probability, and the top-ranked sentences are chosen
to create a summary of 1,000 words; the selected
sentences are then ordered according to their position
in the source document.
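The selection step can be sketched as follows. Exactly how the 1,000-word budget is enforced is not specified, so this version assumes the sentence that crosses the budget is kept. Sentence order is restored from document positions at the end.

```python
def build_summary(sentences, probs, budget=1000):
    """Pick the highest-probability sentences until `budget` words, then restore order.

    Assumption: the sentence that crosses the budget is kept, so the summary
    can slightly exceed `budget` words.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: probs[i], reverse=True)
    chosen, n_words = [], 0
    for i in ranked:
        if n_words >= budget:
            break
        chosen.append(i)
        n_words += len(sentences[i].split())
    return [sentences[i] for i in sorted(chosen)]
```

With a 4-word budget and sentences `["a b", "c d e", "f"]` scored `[0.1, 0.9, 0.5]`, the result is `["c d e", "f"]`, in document order.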

5 Evaluation 

Document summarization has a standard (if imperfect)
evaluation in the ROUGE score (Lin and Hovy, 2003),
which, as an n-gram recall measure, stresses the
ability of the candidate summary to recover the
words in the reference. To evaluate an automatically
generated summary, we calculate the ROUGE score
between the generated summary and the held-out
reference summary from Wikipedia for each book.
We consider both ROUGE-1, which measures the
overlap of unigrams, and ROUGE-2, which measures
bigram overlap. For the case of a single reference
summary, ROUGE-N is calculated as follows (where $w$
ranges over all unigrams or bigrams in the reference
summary, depending on $N$, and $c(w; x)$ is the count of
the n-gram $w$ in text $x$):

$$\text{ROUGE-}N = \frac{\sum_{w \in \text{ref}} \min\big(c(w; \text{cand}), c(w; \text{ref})\big)}{\sum_{w \in \text{ref}} c(w; \text{ref})} \tag{9}$$
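Equation 9 is clipped n-gram recall against a single reference. A minimal sketch on pre-tokenized text follows; the official ROUGE toolkit's tokenization, stemming, and stopword options are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """Clipped n-gram recall of candidate against one reference (Equation 9)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total
```

For the reference `the cat sat on the mat` and the candidate `the cat on the mat`, ROUGE-1 is 5/6 and ROUGE-2 is 3/5.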



Table 2 lists the results of 10-fold cross-validation on
the 439 available book/summary pairs. Both alignment
models described above show a modest improvement
over the method of Jing and McKeown (1999): 0.6 to 0.7
ROUGE-1 points and 0.2 ROUGE-2 points. For comparison,
we also present a baseline of simply choosing the
first 1,000 words in the book as the summary.



Model                    ROUGE-1   ROUGE-2
Passage (Block) HMM        41.4       6.2
Token (Word) HMM           41.3       6.2
Jing 1999                  40.7       6.0
First 1000                 38.0       6.0

Table 2: ROUGE summarization scores.

How well does this method actually work in prac- 
tice, however, at the task of generating summaries? 
Manually inspecting the generated summaries re- 
veals that automatic summarization of books still 
has great room for improvement, for all alignment 
methods involved. Appendix A shows the sentences 
extracted as a summary for Heart of Darkness. 

Independent of the quality of the generated summaries
on held-out test data, one practical benefit of
training binary log-linear models is that the resulting
feature weights are interpretable, providing a
data-driven glimpse into the qualities of a sentence that
make it conducive to appearing in a human-created
summary. Table 3 lists the 25 strongest features
predicting inclusion in the summary (rank-averaged
over all ten training splits). The presence of a name
in a sentence is highly predictive, as is its position at
the beginning of a book (decile 0) or near the end
(deciles 8 and 9). The strongest lexical features
illustrate the importance of a character's persona,
particularly in relation to others (father, son,
etc.), as well as the natural importance of major life
events (death). The importance of these features
in the generated summary of Heart of Darkness is
clear: nearly every sentence contains a name, and
the most important plot point captured is indeed one
such life event ("Mistah Kurtz - he dead.").
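Rank-averaging across training splits, as used for the list below, can be sketched as follows. Two details are assumptions not stated in the text: a feature missing from a split is treated as having weight 0 there, and ties are broken alphabetically so the output is deterministic.

```python
def rank_average(weight_dicts, top=25):
    """Order features by their mean rank across splits (rank 1 = largest weight).

    Assumptions: a feature missing from a split has weight 0 there, and ties
    are broken alphabetically so the result is deterministic.
    """
    features = sorted(set().union(*weight_dicts))
    ranks = {f: [] for f in features}
    for weights in weight_dicts:
        ordered = sorted(features, key=lambda f: (-weights.get(f, 0.0), f))
        for r, f in enumerate(ordered, start=1):
            ranks[f].append(r)
    mean_rank = {f: sum(rs) / len(rs) for f, rs in ranks.items()}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top]
```

A feature that is strong in every split, even if never first, outranks one that tops a single split but falls off elsewhere.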



1. IS_NAME 

2. DECILE_0 

3. TF-IDF < 100 

4. DECILE_8 

5. mr. 

6. TF-IDF < 10 

7. father 

8. love 

9. son 

10. brother 

11. years 

12. young 

13. mother 

14. family 

15. DECILE_9 

16. daughter 

17. wife 

18. man 

19. boy 

20. life 

21. death 

22. house 

23. chapter 

24. child 

25. sir 

Table 3: Strongest features predicting inclusion in a
summary.

6 Conclusion 

We present here two new methods optimized for
aligning the full text of books with comparatively
much shorter summaries, where the assumption that
an exact word or phrase alignment is possible may
not always hold. While these methods
perform competitively in a downstream evaluation, 
book summarization clearly remains a challenging 
task. Nevertheless, improved book/summary align- 
ments hold intrinsic value in shedding light on what 
features of a work are deemed "summarizable" by 
human editors, and may potentially be exploited by 
tasks beyond summarization as well. 



A Generated summary for Heart of Darkness

• " And this also , " said Marlow suddenly , " has been
one of the dark places of the earth . " He was the only
man of us who still " followed the sea . " The worst
that could be said of him was that he did not represent
his class .

• No one took the trouble to grunt even ; and presently 
he said , very slow - " I was thinking of very old times 
, when the Romans first came here , nineteen hundred 
years ago - the other day .... Light came out of this 
river since - you say Knights ? 

• We looked on , waiting patiently - there was nothing 
else to do till the end of the flood ; but it was only after 
a long silence , when he said , in a hesitating voice , " 
I suppose you fellows remember I did once turn fresh 
- water sailor for a bit , " that we knew we were fated 
, before the ebb began to run , to hear about one of 
Marlow ' s inconclusive experiences . 

• I know the wife of a very high personage in the Ad- 
ministration , and also a man who has lots of influence 
with , ' etc . She was determined to make no end of 
fuss to get me appointed skipper of a river steamboat , 
if such was my fancy . 

• He shook hands , I fancy , murmured vaguely , was 
satisfied with my French . 

• I found nothing else to do but to offer him one of my 
good Swede ' s 

• Kurtz was ... I felt weary and irritable . 

• Kurtz was the best agent he had , an exceptional man , 
of the greatest importance to the Company ; therefore 
I could understand his anxiety . 

• I heard the name of Kurtz pronounced , then the words
, ' take advantage of this unfortunate accident . ' One
of the men was the manager .

• Kurtz , ' I continued , severely , ' is General Manager 
, you won ' t have the opportunity . ' "He blew the 
candle out suddenly , and we went outside . 

• The approach to this Kurtz grubbing for ivory in the 
wretched bush was beset by as many dangers as though 
he had been an enchanted princess sleeping in a fabu- 
lous castle . 

• In a moment he came up again with a jump , possessed
himself of both my hands , shook them continuously ,
while he gabbled : ' Brother sailor ... honour ...
pleasure ... delight ... introduce myself ... Russian ... son
of an arch - priest ... Government of Tambov ... What

• Where ' s a sailor that does not smoke ? " " The pipe
soothed him , and gradually I made out he had run
away from school , had gone to sea in a Russian ship
; ran away again ; served some time in English ships ;
was now reconciled with the arch - priest .

• " He informed me , lowering his voice , that it was
Kurtz who had ordered the attack to be made on the
steamer .

• " We had carried Kurtz into the pilot - house : there 
was more air there . 

• Suddenly the manager ' s boy put his insolent black 
head in the doorway , and said in a tone of scathing 
contempt : " ' Mistah Kurtz - he dead . ' "All the 
pilgrims rushed out to see . 

• That is why I have remained loyal to Kurtz to the last ,
and even beyond , when a long time after I heard once
more , not his own voice , but the echo of his magnif-
icent eloquence thrown to me from a soul as translu-
cently pure as a cliff of crystal .

• Kurtz ' s knowledge of unexplored regions must have
been necessarily extensive and peculiar - owing to his
great abilities and to the deplorable circumstances in
which he had been placed : therefore - ' I assured him
Mr.

• ' There are only private letters . ' He withdrew upon 
some threat of legal proceedings , and I saw him no 
more ; but another fellow , calling himself Kurtz ' s 
cousin , appeared two days later , and was anxious to 
hear all the details about his dear relative ' s last mo- 
ments . 

• Incidentally he gave me to understand that Kurtz had 
been essentially a great musician . 

• I had no reason to doubt his statement ; and to this
day I am unable to say what was Kurtz ' s profession
, whether he ever had any - which was the greatest of
his talents .

• This visitor informed me Kurtz ' s proper sphere ought 
to have been politics ' on the popular side . ' He had 
furry straight eyebrows , bristly hair cropped short , an 
eyeglass on a broad ribbon , and , becoming expansive , 
confessed his opinion that Kurtz really couldn ' t write 
a bit - ' but heavens ! how that man could talk . 

• All that had been Kurtz ' s had passed out of my hands 
: his soul , his body , his station , his plans , his ivory , 
his career . 

• And , by Jove ! the impression was so powerful that 
for me , too , he seemed to have died only yesterday - 
nay , this very minute . 

• He had given me some reason to infer that it was his 
impatience of comparative poverty that drove him out 
there . " ' ... Who was not his friend who had heard 
him speak once ? ' she was saying . 



• Would they have fallen , I wonder , if I had rendered 
Kurtz that justice which was his due ? 